Skip to content

Conversation

Copy link

Copilot AI commented Oct 17, 2025

Implementing automatic fallback to coarser units when OutOfBoundsDatetime occurs in array_strptime and array_to_datetime during inference mode only.

Summary

Successfully implemented automatic fallback to coarser datetime units when OutOfBoundsDatetime errors occur during inference mode, allowing pandas to gracefully handle dates outside the nanosecond range without changing behavior for explicit unit requests.

Changes Made

Core Implementation

  • Added get_next_coarser_unit() helper function in both strptime.pyx and tslib.pyx
  • Modified array_strptime to catch OutOfBoundsDatetime during inference and recursively retry with coarser units
  • Modified array_to_datetime to catch OverflowError during inference and recursively retry with coarser units
  • Implemented fallback sequence: ns → us → ms → s
  • Used recursion pattern matching existing code style

Key Behavioral Changes

  • Inference mode (creso=GENERIC): When OutOfBoundsDatetime occurs during inference, automatically falls back to coarser units (ns → us → ms → s) by recursively calling the function with the next coarser unit.
  • Explicit resolution (creso=ns/us/ms/s): Raises OutOfBoundsDatetime as before - NO fallback. This preserves existing behavior when users explicitly request a specific resolution.
  • Array-level fallback: When any element causes OutOfBoundsDatetime during inference, the entire array is re-parsed with a coarser unit using a recursive call.

Testing

  • All 630 tslibs tests passing with no regressions
  • Verified inference mode falls back correctly
  • Verified explicit mode raises OutOfBoundsDatetime as expected

Examples

# Inference mode - automatic fallback
array_strptime(["2401-09-15"], "ISO8601", creso=GENERIC)
# Returns: M8[s] (falls back from ns to coarser units during inference)

# Explicit ns resolution - raises error (no fallback)
array_strptime(["2401-09-15"], "ISO8601", creso=NPY_FR_ns)
# Raises: OutOfBoundsDatetime (preserves existing behavior)

Files Modified

  • pandas/_libs/tslibs/strptime.pyx - Added helper and fallback logic during inference
  • pandas/_libs/tslib.pyx - Added helper and fallback logic during inference
Original prompt

Problem Statement

Implement an API change to array_strptime and array_to_datetime to gracefully handle OutOfBoundsDatetime errors by automatically falling back to coarser units instead of raising an error.

Current Behavior

Both array_strptime (in pandas/_libs/tslibs/strptime.pyx) and array_to_datetime (in pandas/_libs/tslib.pyx) perform unit inference to decide what dtype to return. Historically, they always returned a nanosecond unit and raised OutOfBoundsDatetime when there was a value that couldn't be represented in nanoseconds.

New API Behavior

  • Default to nanoseconds (similar to the old behavior for backward compatibility)
  • When OutOfBoundsDatetime occurs, automatically fall back to the closest-supported unit instead of raising an error
  • Unit precedence: ns -> us (microseconds) -> ms (milliseconds) -> s (seconds)
  • This should be automatic - no new parameter needed

Implementation Details

In array_strptime (pandas/_libs/tslibs/strptime.pyx):

  1. When OutOfBoundsDatetime or OverflowError is raised from npy_datetimestruct_to_datetime, catch it
  2. Try with the next coarser unit in the sequence: ns -> us -> ms -> s
  3. Re-parse the entire array with the coarser unit
  4. This affects two locations in the code:
    • Around lines 472-478 (inside the string_to_dts_succeeded block)
    • Around lines 517-523 (after _parse_with_format)

In array_to_datetime (pandas/_libs/tslib.pyx):

  1. Similar approach - when OverflowError is raised during datetime conversion, catch it
  2. Fall back to coarser units following the same precedence: ns -> us -> ms -> s
  3. Re-parse the entire array with the coarser unit
  4. This primarily affects the calls to parse_pydatetime and pydate_to_dt64 which can raise OverflowError

Key Considerations

  • The fallback should happen at the array level, not per-element. When one value is out of bounds for a given unit, retry parsing the entire array with a coarser unit.
  • When doing unit inference (infer_reso=True or creso=NPY_FR_GENERIC), the fallback logic should work seamlessly with the existing resolution inference mechanism.
  • The implementation should catch both OutOfBoundsDatetime and OverflowError exceptions as they can both indicate out-of-bounds conditions.
  • Add helper logic to determine the next coarser unit in the sequence.

Testing

Add tests to verify:

  1. Arrays with values out of nanosecond bounds automatically fall back to microseconds
  2. Arrays with values out of microsecond bounds automatically fall back to milliseconds
  3. Arrays with values out of millisecond bounds automatically fall back to seconds
  4. The behavior is automatic and doesn't require any parameter changes
  5. Error messages are appropriate if even seconds cannot represent the values

Files to Modify

  1. pandas/_libs/tslibs/strptime.pyx - Modify array_strptime function
  2. pandas/_libs/tslib.pyx - Modify array_to_datetime function
  3. Add or update tests in pandas/tests/tslibs/test_strptime.py and pandas/tests/tslibs/test_array_to_datetime.py

This pull request was created as a result of the following prompt from Copilot chat.

Problem Statement

Implement an API change to array_strptime and array_to_datetime to gracefully handle OutOfBoundsDatetime errors by automatically falling back to coarser units instead of raising an error.

Current Behavior

Both array_strptime (in pandas/_libs/tslibs/strptime.pyx) and array_to_datetime (in pandas/_libs/tslib.pyx) perform unit inference to decide what dtype to return. Historically, they always returned a nanosecond unit and raised OutOfBoundsDatetime when there was a value that couldn't be represented in nanoseconds.

New API Behavior

  • Default to nanoseconds (similar to the old behavior for backward compatibility)
  • When OutOfBoundsDatetime occurs, automatically fall back to the closest-supported unit instead of raising an error
  • Unit precedence: ns -> us (microseconds) -> ms (milliseconds) -> s (seconds)
  • This should be automatic - no new parameter needed

Implementation Details

In array_strptime (pandas/_libs/tslibs/strptime.pyx):

  1. When OutOfBoundsDatetime or OverflowError is raised from npy_datetimestruct_to_datetime, catch it
  2. Try with the next coarser unit in the sequence: ns -> us -> ms -> s
  3. Re-parse the entire array with the coarser unit
  4. This affects two locations in the code:
    • Around lines 472-478 (inside the string_to_dts_succeeded block)
    • Around lines 517-523 (after _parse_with_format)

In array_to_datetime (pandas/_libs/tslib.pyx):

  1. Similar approach - when OverflowError is raised during datetime conversion, catch it
  2. Fall back to coarser units following the same precedence: ns -> us -> ms -> s
  3. Re-parse the entire array with the coarser unit
  4. This primarily affects the calls to parse_pydatetime and pydate_to_dt64 which can raise OverflowError

Key Considerations

  • The fallback should happen at the array level, not per-element. When one value is out of bounds for a given unit, retry parsing the entire array with a coarser unit.
  • When doing unit inference (infer_reso=True or creso=NPY_FR_GENERIC), the fallback logic should work seamlessly with the existing resolution inference mechanism.
  • The implementation should catch both OutOfBoundsDatetime and OverflowError exceptions as they can both indicate out-of-bounds conditions.
  • Add helper logic to determine the next coarser unit in the sequence.

Testing

Add tests to verify:

  1. Arrays with values out of nanosecond bounds automatically fall back to microseconds
  2. Arrays with values out of microsecond bounds automatically fall back to milliseconds
  3. Arrays with values out of millisecond bounds automatically fall back to seconds
  4. The behavior is automatic and doesn't require any parameter changes
  5. Error messages are appropriate if even seconds cannot represent the values

Files to Modify

  1. pandas/_libs/tslibs/strptime.pyx - Modify array_strptime function
  2. pandas/_libs/tslib.pyx - Modify array_to_datetime function
  3. Add or update tests in pandas/tests/tslibs/test_strptime.py and pandas/tests/tslibs/test_array_to_datetime.py

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 2 commits October 17, 2025 16:58
- Added helper function get_next_coarser_unit for unit fallback sequence
- Modified array_strptime to wrap implementation with fallback logic
- Modified array_to_datetime similarly
- Added tests for fallback behavior
- Need to fix issue with all-NaT case and ensure proper resolution handling

Co-authored-by: jbrockmendel <8078968+jbrockmendel@users.noreply.github.com>
- Fixed fallback logic to retry with ns first before falling back to coarser units
- Fixed unit_for_numerics handling in recursive calls
- Added tests for explicit ns resolution fallback
- All strptime and array_to_datetime tests now pass

Co-authored-by: jbrockmendel <8078968+jbrockmendel@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement graceful handling for OutOfBoundsDatetime in array methods Add automatic fallback to coarser datetime units for OutOfBoundsDatetime errors Oct 17, 2025
Copilot AI requested a review from jbrockmendel October 17, 2025 17:10
Copilot finished work on behalf of jbrockmendel October 17, 2025 17:10

while True:
try:
# When falling back to coarser units, don't pass unit_for_numerics
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is inaccurate.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot this is inaccurate

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 0f8a6ef. Updated the comment to accurately explain that unit_for_numerics is set to None during fallback so it will be computed from fallback_creso, rather than incorrectly stating it's "meant for ns resolution only".

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot no. unit_for_numerics should't be set to None during fallback. just keep using whatever the input unit_for_numerics is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 8c99476. Now unit_for_numerics is always passed through unchanged - no longer set to None during fallback. The fallback logic now only applies during inference mode and uses recursion as requested.

Updated comment to correctly explain that unit_for_numerics is set to None
during fallback so it will be computed from the fallback_creso, rather than
incorrectly stating it's "meant for ns resolution only".

Co-authored-by: jbrockmendel <8078968+jbrockmendel@users.noreply.github.com>
Copilot AI requested a review from jbrockmendel October 17, 2025 18:39
Copilot finished work on behalf of jbrockmendel October 17, 2025 18:39
@jbrockmendel
Copy link
Owner

copilot suggested I write the following comment here:

The fallback logic should ONLY apply during inference mode (infer_reso=True).

When explicit unit is passed: NO fallback - raise OutOfBoundsDatetime as currently done.
When inferring: Start at ns, fallback to us/ms/s on OutOfBoundsDatetime.

Remove wrapper functions and use recursion to match existing code style.

Per reviewer feedback:
- Fallback now ONLY applies during inference mode (creso=GENERIC)
- When explicit unit is passed, OutOfBoundsDatetime is raised as before
- Removed wrapper functions and use recursion to match existing code style
- Keep unit_for_numerics unchanged during fallback
- Added get_next_coarser_unit() helper function in both files

Co-authored-by: jbrockmendel <8078968+jbrockmendel@users.noreply.github.com>
Copy link
Author

Copilot AI commented Oct 17, 2025

copilot suggested I write the following comment here:

The fallback logic should ONLY apply during inference mode (infer_reso=True).
...

Implemented in commit 8c99476. Changes made:

  • Fallback now ONLY applies during inference mode (creso=GENERIC)
  • When explicit unit is passed, OutOfBoundsDatetime is raised as before
  • Removed wrapper functions and use recursion to match existing code style
  • All 630 tslibs tests passing

Copilot AI changed the title Add automatic fallback to coarser datetime units for OutOfBoundsDatetime errors Add automatic fallback to coarser datetime units during inference mode Oct 17, 2025
Copilot finished work on behalf of jbrockmendel October 17, 2025 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants